Bioinformatics (Thomas Dandekar, Meik Kunz)

330

20.8

When Does the Computer Stop Calculating?

Question 8.1

Here some algorithms are compared in terms of their computation time, it results:

(a) RNAfold with small RNA and large RNA (quadratic increase with sequence)

(b) BLAST search (grows linearly with search sequence and database)

Short peptide example, long protein example. Search in the NRDB database, and only in

the human sequences (use species option).

The E-value moves favorably downward, toward smaller values, for a smaller database.

Why? Well, the larger the database, the higher the probability of random hits. So the

expected value (E-value) for a random, non-biological, non-relevant hit gets higher. So the

better I can narrow down where I expect my hit to be (e.g. a species-specific database), the

more significant and meaningful my result will be.

(a) Protein folding

This is an NP-hard problem, which means that the computation time increases many

times with each additional amino acid. It is thus not at all clear how long the computer will

take (non-polynomial complex problem), but at least if you get a solution you can deter

mine in polynomial time how good it is. Nevertheless, protein structures can be predicted

for many practical purposes, e.g. by comparison with known structures, e.g. with SWISS-

MODEL (but already here the answer comes only by e-mail, it takes time), or somewhat

more precisely, but more computationally expensive, with MODELLER or actually “ab

initio”, i.e. from the sequence, by folding, calculated by the Zhang lab (with QUARK etc.).

Question 8.2

A nice answer is given by this Youtube video, but unfortunately it is in English:

https://www.youtube.com/watch?v=SC5CX8drAtU.

Here are compared:

Greedy strategy: locally optimal choice at each stage; at each stage visit an unvisited

city nearest to the current city. This heuristic need not find a best solution, but terminates

in a reasonable number of steps; finding an optimal solution typically requires unreason

ably many steps. In mathematical optimization, greedy algorithms solve combinatorial

problems having the properties of matroids (a structure that captures and generalizes the

notion of linear independence in vector spaces).

Local search strategy: Local search is a generic term for a number of metaheuristic

search methods in combinatorial optimization. The methods are used in many variations

to solve complicated optimization problems approximately (e.g., the traveling salesman

problem). The basic principle is to find a better solution starting from a given initial

20 Solutions to the Exercises